18 research outputs found

Natural Language Processing and E-Government: Extracting Reusable Crime Report Information

Author: Iriberri Alicia, '06
Leroy Gondy A.
Publication venue: Scholarship @ Claremont
Publication date: 01/01/2007

Crime reporting needs to be possible 24/7. Although 911 and tip-lines are the most publicized reporting mechanisms, several other options exist, ranging from in-person reporting to online submissions. Internet-based crime reporting systems allow victims and witnesses of crime to report incidents to police 24/7 from any location. However, these existing e-mail and text-based systems provide little support for witnesses' memory recall, leading to reports with less information and lower accuracy. These systems also do not facilitate reuse and integration of the reported information with other information systems. We are developing an anonymous Online Crime Reporting System that is designed to extract relevant crime information from witnesses' narratives and to ask additional questions based on that information. We leverage natural language processing and investigative interviewing techniques to support memory recall and map the information directly to a database to support information reuse. We report on the evaluation of the Suspect Description Module (SDM) of the system. Our interface captures 70% (recall) of information from witness narratives with 100% precision. Additional modules will follow the design and development methods used with this module.

Scholarship@Claremont

Reporting On-Campus Crime Online: User Intention to Use

Author: Garrett Nathan
Iriberri Alicia, '06
Leroy Gondy A.
Publication venue: Scholarship @ Claremont
Publication date: 01/01/2006

National surveys demonstrate that millions of crimes go unreported in the United States. Several reasons may contribute to this lack of reporting, and we are investigating these potential reasons and how they may be addressed. We are developing an online system that provides an anonymous and secure mechanism for both victims and witnesses to report crimes to police. The system is being implemented and tested on a university campus. Potential users (i.e., students, staff) were surveyed to determine their intent to use the system. Respondents claimed to report crimes already, which is in contrast with the findings from the national surveys. Our respondents found the online system useful, accessible, and safe for reporting crime, but the type of crime and the urgency of response are determinants in the decision to use the system versus reporting to a live person.

Scholarship@Claremont

Data Mining Techniques to Study Therapy Success with Autistic Children

Author: Charlop Marjorie H.
Irmscher Annika
Leroy Gondy A.
Publication venue: Scholarship @ Claremont
Publication date: 01/01/2006

Autism spectrum disorder has become one of the most prevalent developmental disorders, characterized by a wide variety of symptoms. Many children need extensive therapy for years to improve their behavior and facilitate integration in society. However, few systematic evaluations are done on a large scale that can provide insights into how and where therapy has an impact. We describe how data mining techniques can be used to provide insights into behavioral therapy as well as its effect on participants. To this end, we are developing a digital library of coded video segments that contains data on appropriate and inappropriate behavior of autistic children in different social settings during different stages of therapy. In general, we found that therapy increased appropriate behavior and decreased inappropriate behavior. Decision trees and association rules provided more detailed insights for high and low levels of appropriate and inappropriate behavior. We found that a child's interaction with a parent or therapist led to especially high levels of appropriate behavior, and that behavior is most predictable while therapy is in progress.

Scholarship@Claremont

Crime Information Extraction from Police and Witness Narrative Reports

Author: Iriberri Alicia, '06
Ku Chih Hao, '12
Leroy Gondy A.
Publication venue: Scholarship @ Claremont
Publication date: 01/01/2008

To solve crimes, investigators often rely on interviews with witnesses, victims, or criminals themselves. The interviews are transcribed and the pertinent data is contained in narrative form. To solve one crime, investigators may need to interview multiple people and then analyze the narrative reports. There are several difficulties with this process: interviewing people is time-consuming, the interviews (sometimes conducted by multiple officers) need to be combined, and the resulting information may still be incomplete. For example, victims or witnesses are often too scared or embarrassed to report or prefer to remain anonymous. We are developing an online reporting system that combines natural language processing with insights from the cognitive interview approach to obtain more information from witnesses and victims. We report here on information extraction from police and witness narratives. We achieved high precision, 94% and 96%, and recall, 85% and 90%, for both narrative types.

Scholarship@Claremont

A Classifier to Evaluate Language Specificity in Medical Documents

Author: Chatterjee Samir
Fan Jie
Leroy Gondy A.
Miller Trudi, '08
Thoms Brian, '09
Publication venue: Scholarship @ Claremont
Publication date: 01/01/2007

Consumer health information written by health care professionals is often inaccessible to the consumers it is written for. Traditional readability formulas examine syntactic features like sentence length and number of syllables, ignoring the target audience's grasp of the words themselves. The use of specialized vocabulary disrupts the understanding of patients with low reading skills, causing a decrease in comprehension. A naive Bayes classifier for three levels of increasing medical terminology specificity (consumer/patient, novice health learner, medical professional) was created with a lexicon generated from a representative medical corpus. Ninety-six percent accuracy in classification was attained. The classifier was then applied to existing consumer health web pages. We found that only 4% of pages were classified at a layperson level, regardless of the Flesch reading ease scores, while the remaining pages were at the level of medical professionals. This indicates that consumer health web pages are not using appropriate language for their target audience.

Scholarship@Claremont

Non-verbal Communication with Autistic Children Using Digital Libraries

Author: Charlop Marjorie H.
Chuang Serena, '05
Huang John, '05
Leroy Gondy A.
Publication venue: Scholarship @ Claremont
Publication date: 01/01/2005

Autism spectrum disorder (ASD) has become one of the most prevalent mental disorders over the last few years and its prevalence is still growing. The disorder is characterized by a wide variety of symptoms such as lack of social behavior, extreme withdrawal, and problems communicating. Because of the diversity of symptoms and the wide variation in their severity, each autistic child has different needs and requires individualized therapy. This leads to long waiting lists for therapy.

Scholarship@Claremont

Facilitating knowledge discovery by integrating bottom-up and top-down knowledge sources: A text mining approach

Author: Leroy Gondy A.
Publication venue: The University of Arizona.
Publication date: 01/01/2003

This dissertation aims to discover synergistic combinations of top-down (ontologies), interactive (relevance feedback), and bottom-up (machine learning) knowledge encoding techniques for text mining. The strength of machine learning techniques lies in their coverage and efficiency because they can discover new knowledge without human intervention. The output, however, is often imprecise and irrelevant. Human knowledge, top-down or interactively encoded, may remedy this. The research question addressed is if knowledge discovery can become more precise and relevant with hybrid systems. Three different combinations are evaluated. The first study investigates an ontology, the Unified Medical Language System (UMLS), combined with an automatically created thesaurus to dynamically adjust the thesaurus' output. The augmented thesaurus was added to a medical, meta-search portal as a keyword suggester and compared with the unmodified thesaurus and UMLS. Users preferred the hybrid approach. Thus, the combination of the ontology with the thesaurus was better than the components separately. The second study investigates implicit relevance feedback combined with genetic algorithms designed to adjust user queries for online searching. These were compared with pure relevance feedback algorithms. Users were divided into groups based on their overall performance. The genetic algorithm significantly helped low achievers, but hindered high achievers. Thus, the interactively elicited knowledge from relevance feedback was judged insufficient to guide machine learning for all users. The final study investigates ontologies combined with two natural language processing techniques: a shallow parser and an automatically created thesaurus. Both capture relations between phrases in biomedical text. Qualified researchers found all terms to be precise; however, terms that belonged to ontologies were more relevant. Parser relations were all precise. 
Thesaurus relations were less precise, but precision improved for relations that had their terms represented in ontologies. Thus, this integration of ontologies with natural language processing provided good results. In general, it was concluded that top-down encoded knowledge could be effectively integrated with bottom-up encoded knowledge for knowledge discovery in text. This is particularly relevant to business fields, which are text and knowledge intensive. In the future, it will be worthwhile to extend the parser and also to test similar hybrid approaches for data mining.

The University of Arizona

Programs for machine learning

Author: Gondy Leroy
Thomas C. Rindflesch
Publication venue: Morgan Kaufmann

learning algorithms on word sense disambiguation with small dataset

CiteSeerX

Using Symbolic Knowledge in the UMLS to Disambiguate Words in Small Datasets with a Naïve Bayes Classifier

Author: Gondy Leroy
M. Fieschi et al. (eds.)
Thomas C. Rindflesch

Current approaches to word sense disambiguation use and combine various machine-learning techniques. Most refer to characteristics of the ambiguous word and surrounding words and are based on hundreds of examples. Unfortunately, developing large training sets is time-consuming. We investigate the use of symbolic knowledge to augment machine-learning techniques for small datasets. UMLS semantic types assigned to concepts found in the sentence and relationships between these semantic types form the knowledge base. A naïve Bayes classifier was trained for 15 words with 100 examples for each. The most frequent sense of a word served as the baseline. The effect of increasingly accurate symbolic knowledge was evaluated in eight experimental conditions. Performance was measured by accuracy based on 10-fold cross-validation. The best condition used only the semantic types of the words in the sentence. Accuracy was then on average 10% higher than the baseline; however, it varied from 8% deterioration to 29% improvement. In a follow-up evaluation, we noted a trend that the best disambiguation was found for words that were the least troublesome to the human evaluators. Keywords: Artificial intelligence, machine learning, naïve Bayes, wor

CiteSeerX